
    deepDriver: Predicting Cancer Driver Genes Based on Somatic Mutations Using Deep Convolutional Neural Networks

    With the advances in high-throughput technologies, millions of somatic mutations have been reported in the past decade. Identifying driver genes with oncogenic mutations from these data is a critical and challenging problem. Many computational methods have been proposed to predict driver genes. Among them, machine learning-based methods usually train a classifier on representations that concatenate various types of features extracted from different kinds of data. Although successful, simply concatenating different types of features may not be the best way to fuse these data. We observe that a few types of data characterize the similarities of genes. To integrate them better with other data and improve the accuracy of driver gene prediction, this study proposes a deep learning-based method (deepDriver) that performs convolution on the mutation-based features of genes and their neighbors in the similarity networks. This allows the convolutional neural network to learn from the mutation data and the similarity networks simultaneously, which enhances the prediction of driver genes. deepDriver achieves AUC scores of 0.984 and 0.976 on breast cancer and colorectal cancer, respectively, which are superior to those of the competing algorithms. Further evaluation of the top 10 predictions also demonstrates that deepDriver is valuable for predicting new driver genes.
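As a rough illustration of the feature-construction idea described in the abstract (not the paper's implementation), one can stack a gene's mutation-based feature vector with those of its most similar neighbors and slide a 1-D filter over the stacked rows. The gene names, feature values and similarity scores below are invented for illustration.

```python
# Hedged sketch: stack a gene's features with its k nearest neighbours
# in a similarity network, then convolve a filter down the stack.
# All names and numbers are toy data, not from the paper.

def neighbour_matrix(gene, features, similarity, k=2):
    """Rows: the gene itself followed by its k most similar neighbours."""
    neighbours = sorted((g for g in features if g != gene),
                        key=lambda g: -similarity[gene][g])[:k]
    return [features[gene]] + [features[g] for g in neighbours]

def conv1d(rows, kernel):
    """Convolve one kernel down the stacked rows (no padding, stride 1)."""
    out = []
    for i in range(len(rows) - len(kernel) + 1):
        out.append(sum(w * v
                       for krow, row in zip(kernel, rows[i:i + len(kernel)])
                       for w, v in zip(krow, row)))
    return out

features = {          # toy mutation-based feature vectors per gene
    "TP53":  [0.9, 0.8, 0.7],
    "BRCA1": [0.8, 0.6, 0.9],
    "GENE3": [0.1, 0.2, 0.1],
}
similarity = {"TP53": {"BRCA1": 0.9, "GENE3": 0.1},
              "BRCA1": {"TP53": 0.9, "GENE3": 0.2},
              "GENE3": {"TP53": 0.1, "BRCA1": 0.2}}

rows = neighbour_matrix("TP53", features, similarity, k=2)
activations = conv1d(rows, kernel=[[1, 1, 1], [1, 1, 1]])
```

In the real method a CNN would learn the kernel weights; here a fixed all-ones kernel just shows how neighbour information enters each activation.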

    A comparison of different cluster mass estimates: consistency or discrepancy ?

    Rich and massive clusters of galaxies at intermediate redshift are capable of magnifying and distorting the images of background galaxies. A comparison of different mass estimators among these clusters can provide useful information about the distribution and composition of cluster matter and their dynamical evolution. Using the largest sample of lensing clusters yet drawn from the literature, we compare the gravitating masses of clusters derived from strong/weak gravitational lensing, from X-ray measurements based on the assumption of hydrostatic equilibrium, and from the conventional isothermal sphere model for the dark matter profile, characterized by the velocity dispersion and core radius of the galaxy distributions in clusters. While there is excellent agreement between the weak lensing, X-ray and isothermal sphere model determined cluster masses, these methods are likely to underestimate the gravitating masses enclosed within the central cores of clusters by a factor of 2--4 as compared with the strong lensing results. Such a mass discrepancy probably arises from inappropriate application of the weak lensing technique and the hydrostatic equilibrium hypothesis to the central regions of clusters, as well as from an unreasonably large core radius for both the luminous and dark matter profiles. Nevertheless, these cluster mass estimators may be safely applied on scales greater than the core sizes. Namely, clusters of galaxies at intermediate redshift can, on the whole, still be regarded as dynamically relaxed systems, in which the velocity dispersion of galaxies and the temperature of the X-ray emitting gas are good indicators of the underlying gravitational potentials of the clusters. Comment: 16 pages with 7 PS figures, MNRAS in press
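For the isothermal sphere estimator mentioned above, the gravitating mass within radius r of a singular isothermal sphere with line-of-sight velocity dispersion sigma is M(r) = 2 sigma^2 r / G. The sketch below evaluates this standard formula for illustrative cluster-scale numbers (the sigma and r values are not from the paper).

```python
# Hedged sketch: singular isothermal sphere mass estimate
# M(r) = 2 * sigma^2 * r / G, in solar masses.
# Input values are illustrative, not taken from the sample in the paper.

G = 6.674e-11          # gravitational constant, m^3 kg^-1 s^-2
MSUN = 1.989e30        # solar mass, kg
MPC = 3.086e22         # megaparsec, m

def isothermal_mass(sigma_km_s, r_mpc):
    """Gravitating mass within radius r, in solar masses."""
    sigma = sigma_km_s * 1e3        # km/s -> m/s
    r = r_mpc * MPC                 # Mpc -> m
    return 2 * sigma**2 * r / G / MSUN

# sigma = 1000 km/s, r = 1 Mpc: a typical rich-cluster scale
m = isothermal_mass(1000.0, 1.0)
```

The result is of order a few times 10^14 solar masses, the expected scale for a rich cluster, which is why disagreements of a factor of 2--4 in the core are dynamically significant.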

    (Z)-3-(3-Phenylallylidene)-1,5-dioxaspiro[5.5]undecane-2,4-dione

    In the title compound, C18H18O4, the 1,3-dioxane ring adopts a distorted envelope conformation, with the C atom common to the cyclohexane ring forming the flap. In the crystal, inversion dimers linked by pairs of C—H⋯O hydrogen bonds occur.

    Clustering Optimized Portrait Matting Algorithm Based on Improved Sparrow Algorithm

    As a result of the influence of individual appearance and lighting conditions, aberrant noise spots cause significant mis-segmentation of frontal portraits. This paper presents an accurate portrait segmentation approach that combines wavelet proportional shrinkage with an improved sparrow search algorithm (SSA) for clustering, to address the accuracy challenge of frontal portrait segmentation. The brightness component of the portrait in HSV space is first denoised by wavelet proportional shrinkage. The elite inverse learning approach and an adaptive weighting factor are then applied to optimize the initial center locations of the K-Means algorithm, improving the initial distribution and accelerating the convergence of the SSA population members. The pixel segmentation accuracy of the proposed method is approximately 70% and 15% higher than that of two comparable traditional methods, while the similarity of color image features is approximately 10% higher. Experiments show that the proposed method achieves a high level of accuracy under capricious lighting conditions.
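The "elite inverse learning" step is commonly formulated as opposition-based learning: for a candidate centre x inside bounds [lb, ub], the opposite point lb + ub - x is tried, and the better of the two (by clustering cost) is kept. The sketch below shows this seeding idea on toy 1-D data; it is an assumption-laden simplification, not the paper's algorithm, and all data values are invented.

```python
# Hedged sketch of opposition-based ("inverse") refinement of K-Means
# seed centres: try the opposite of each centre and keep the cheaper one.
# Toy 1-D data; the real method operates on pixel features in HSV space.

def cost(centres, data):
    """Sum of squared distances from each point to its nearest centre."""
    return sum(min((x - c) ** 2 for c in centres) for x in data)

def opposition_refine(centres, data, lb, ub):
    refined = list(centres)
    for i in range(len(refined)):
        opposite = lb + ub - refined[i]
        trial = refined[:i] + [opposite] + refined[i + 1:]
        if cost(trial, data) < cost(refined, data):
            refined[i] = opposite
    return refined

data = [0.1, 0.2, 0.15, 0.8, 0.9, 0.85]   # two clear clusters
seeds = [0.05, 0.1]                        # poor seeds: both on the left
better = opposition_refine(seeds, data, lb=0.0, ub=1.0)
```

With both seeds stuck in the left cluster, the opposite of one seed lands near the right cluster and is kept, giving K-Means a far better starting distribution.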

    Applications of graph theory in protein structure identification

    There is growing interest in the identification of proteins on a proteome-wide scale. Among the different kinds of protein structure identification methods, graph-theoretic methods are particularly effective. Owing to their lower cost, higher effectiveness and many other advantages, they have drawn increasing attention from researchers. Specifically, graph-theoretic methods have been widely used in homology identification, side-chain cluster identification, peptide sequencing and so on. This paper reviews several methods for solving protein structure identification problems using graph theory. We mainly introduce classical methods and mathematical models, including homology modeling based on clique finding, identification of side-chain clusters in protein structures from the graph spectrum, and de novo peptide sequencing via tandem mass spectrometry using the spectrum graph model. In addition, concluding remarks and future priorities for each method are given.
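To make the clique-finding idea concrete, the sketch below finds a maximum clique by brute force in a tiny toy "compatibility graph" whose nodes pair up residues of two proteins and whose edges mark compatible pairs. The graph, node labels and brute-force search are purely illustrative; real homology-modelling pipelines use far larger graphs and dedicated clique solvers.

```python
# Hedged sketch: brute-force maximum clique on a toy alignment graph.
# A clique of mutually compatible residue pairings suggests a shared
# structural core. Vertices and edges are invented for illustration.
from itertools import combinations

def is_clique(nodes, edges):
    """True if every pair of the given nodes is joined by an edge."""
    return all(frozenset(p) in edges for p in combinations(nodes, 2))

def max_clique(vertices, edges):
    """Largest clique, by trying candidate sizes from largest down."""
    for size in range(len(vertices), 0, -1):
        for cand in combinations(vertices, size):
            if is_clique(cand, edges):
                return set(cand)
    return set()

vertices = ["A1", "B2", "C3", "D4"]
edges = {frozenset(e) for e in [("A1", "B2"), ("A1", "C3"),
                                ("B2", "C3"), ("C3", "D4")]}
best = max_clique(vertices, edges)
```

Brute force is exponential in general (maximum clique is NP-hard), which is exactly why the surveyed methods matter: they exploit the special structure of alignment graphs.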

    Identifying Disease-Gene Associations With Graph-Regularized Manifold Learning

    Complex diseases are known to be associated with disease genes. Uncovering disease-gene associations is critical for the diagnosis, treatment, and prevention of diseases. Computational algorithms that effectively predict candidate disease-gene associations prior to experimental proof can greatly reduce the associated cost and time. Most existing methods are disease-specific and can only predict genes associated with one specific disease at a time; similarities among diseases are not used during the prediction. Meanwhile, most methods predict new disease genes based on known associations, making them unable to predict disease genes for diseases without known associated genes. In this study, a manifold learning-based method is proposed for predicting disease-gene associations by assuming that the geodesic distance between any disease and its associated genes should be shorter than that of non-associated disease-gene pairs. The model maps the diseases and genes into a lower-dimensional manifold based on the known disease-gene associations, disease similarity and gene similarity, and predicts new associations in terms of the geodesic distance between disease-gene pairs. In 3-fold cross-validation experiments, our method achieves scores of 0.882 and 0.854 in terms of the area under the receiver operating characteristic (ROC) curve (AUC) for diseases with more than one known associated gene and diseases with only one known associated gene, respectively. Further de novo studies on lung cancer and bladder cancer also show that our model is capable of identifying new disease-gene associations.
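The core ranking assumption (associated genes lie at shorter geodesic distance from a disease) can be illustrated without the manifold embedding itself: build a joint weighted graph over diseases and genes and rank candidate genes by shortest-path distance from the query disease. The graph, node names and edge weights below are invented; the paper's actual model learns a low-dimensional embedding rather than running Dijkstra directly.

```python
# Hedged sketch: rank candidate genes by geodesic (shortest-path)
# distance from a disease node in a joint similarity graph.
# Toy graph; edge weights play the role of dissimilarities.
import heapq

def geodesic_distances(graph, source):
    """Dijkstra over a weighted adjacency dict {node: {nbr: weight}}."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue                      # stale heap entry
        for nbr, w in graph[node].items():
            nd = d + w
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return dist

graph = {
    "lung_cancer": {"geneA": 0.2, "disease2": 0.5},
    "disease2":    {"lung_cancer": 0.5, "geneB": 0.1},
    "geneA":       {"lung_cancer": 0.2},
    "geneB":       {"disease2": 0.1},
}
dist = geodesic_distances(graph, "lung_cancer")
ranked = sorted(["geneA", "geneB"], key=dist.get)
```

Note that geneB is reachable only through a similar disease, which mirrors how disease similarity lets the method score genes for diseases with no known associations of their own.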

    A Robust Quantum Random Access Memory

    A "bucket brigade" architecture for a quantum random access memory of N=2^n memory cells needs n(n+5)/2 quantum manipulations on control circuit nodes per memory call. Here we propose a scheme in which only n/2 manipulations on average are required to accomplish a memory call. This scheme may significantly decrease the time spent on a memory call and the average overall error rate per memory call. A physical implementation scheme for storing an arbitrary state in a selected memory cell and then reading it out is discussed. Comment: 5 pages, 3 figures
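A quick arithmetic check of the two counts quoted in the abstract, for a concrete memory size:

```python
# Per-call manipulation counts quoted in the abstract, as functions of n
# (memory size N = 2**n). The value n = 10 below is just an example.

def bucket_brigade_ops(n):
    """Bucket-brigade QRAM: n(n+5)/2 node manipulations per call."""
    return n * (n + 5) // 2

def proposed_avg_ops(n):
    """Proposed scheme: n/2 manipulations per call on average."""
    return n / 2

n = 10                       # N = 1024 memory cells
saving = bucket_brigade_ops(n) / proposed_avg_ops(n)
```

For n = 10 the bucket brigade needs 75 manipulations per call against an average of 5, a 15x reduction; the ratio (n + 5) grows linearly with the address length.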

    Determination of the minimum number of microarray experiments for discovery of gene expression patterns

    BACKGROUND: One type of DNA microarray experiment is the discovery of gene expression patterns for a cell line undergoing a biological process over a series of time points. Two important issues with such an experiment are the number of time points and the interval between them. In the absence of biological knowledge regarding appropriate values, it is natural to ask whether the behaviour of progressively generated data may by itself determine a threshold beyond which further microarray experiments do not contribute to pattern discovery. Additionally, such a threshold implies a minimum number of microarray experiments, which is important given the cost of these experiments. RESULTS: We have developed a method for determining the minimum number of microarray experiments (i.e. time points) for temporal gene expression, assuming that the span between time points is given and that the hierarchical clustering technique is used for gene expression pattern discovery. The key idea is a similarity measure for two clusterings, expressed as a function of the data for progressive time points. While the experiments are underway, this function is evaluated. When the function reaches its maximum, the set of experiments has reached a saturated state, and further experiments do not contribute to the discrimination of patterns. CONCLUSION: The method has been verified with two previously published gene expression datasets. For both experiments, the number of time points determined with our method is smaller than in the published experiments. The overall approach is also applicable to other clustering techniques.
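The stopping rule can be sketched with any clustering-similarity measure: after each new time point, compare the new partition of genes with the previous one, and stop once the similarity saturates. The sketch below uses the Rand index as a stand-in for the paper's measure, with invented toy partitions; the actual method compares hierarchical clusterings.

```python
# Hedged sketch of the saturation criterion: Rand-index similarity of
# gene partitions obtained after successive time points. When the score
# stops rising, additional experiments add little discrimination.
# Partitions below are toy data, not from the verified datasets.
from itertools import combinations

def rand_index(part_a, part_b):
    """Fraction of gene pairs on which two partitions agree
    (both together in each, or both apart in each)."""
    genes = list(part_a)
    agree = sum((part_a[x] == part_a[y]) == (part_b[x] == part_b[y])
                for x, y in combinations(genes, 2))
    return agree / (len(genes) * (len(genes) - 1) / 2)

# Cluster labels after 3, 4 and 5 time points: the last two agree fully.
p3 = {"g1": 0, "g2": 0, "g3": 1, "g4": 1}
p4 = {"g1": 0, "g2": 1, "g3": 1, "g4": 1}
p5 = {"g1": 0, "g2": 1, "g3": 1, "g4": 1}

scores = [rand_index(p3, p4), rand_index(p4, p5)]
```

Here the similarity rises from 0.5 to its maximum of 1.0 and would plateau there, signalling that the experiment series has saturated and further time points are unnecessary.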